{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Timeseries Forecasting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook explains how to use `tsfresh` in time series foreacasting.\n",
    "Make sure you also read through the [documentation](https://tsfresh.readthedocs.io/en/latest/text/forecasting.html) to learn more on this feature.\n",
    "\n",
    "It is basically a copy of the other time series forecasting notebook, but this time using more than one \n",
    "stock.\n",
    "This is conceptionally not much different, but the pandas multi-index magic is a bit advanced :-)\n",
    "\n",
    "We will use the Google, Facebook and Alphabet stock.\n",
    "Please find all documentation in the other notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pylab as plt\n",
    "\n",
    "from tsfresh import extract_features, select_features\n",
    "from tsfresh.utilities.dataframe_functions import roll_time_series, make_forecasting_frame\n",
    "from tsfresh.utilities.dataframe_functions import impute\n",
    "\n",
    "try:\n",
    "    import pandas_datareader.data as web\n",
    "except ImportError:\n",
    "    print(\"You need to install the pandas_datareader. Run pip install pandas_datareader.\")\n",
    "\n",
    "from sklearn.ensemble import AdaBoostRegressor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = web.DataReader(['F', \"AAPL\", \"GOOGL\"], 'stooq')[\"High\"]\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(15, 6))\n",
    "df.plot(ax=plt.gca())\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time we need to make sure to preserve the stock symbol information while reordering:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_melted = df.copy()\n",
    "df_melted[\"date\"] = df_melted.index\n",
    "df_melted = df_melted.melt(id_vars=\"date\", value_name=\"high\").sort_values([\"Symbols\", \"date\"])\n",
    "df_melted = df_melted[[\"Symbols\", \"date\", \"high\"]]\n",
    "\n",
    "df_melted.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create training data sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rolled = roll_time_series(df_melted, column_id=\"Symbols\", column_sort=\"date\",\n",
    "                             max_timeshift=20, min_timeshift=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rolled.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "X = extract_features(df_rolled.drop(\"Symbols\", axis=1), \n",
    "                     column_id=\"id\", column_sort=\"date\", column_value=\"high\", \n",
    "                     impute_function=impute, show_warnings=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We make the data a bit easier to work with by giving them a multi-index instead ot the tuple index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# split up the two parts of the index and give them proper names\n",
    "X = X.set_index([X.index.map(lambda x: x[0]), X.index.map(lambda x: x[1])], drop=True)\n",
    "X.index.names = [\"Symbols\", \"last_date\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our `(AAPL, 2015-07-15 00:00:00)` is also in the data again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X.loc[\"AAPL\", pd.to_datetime('2015-07-15')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Just to repeat: the features in this row were only calculated using the time series values of `AAPL` up to and including `2015-07-15` and the last 20 days."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next line might look like magic if you are not used to pandas transformations, but what it does is:\n",
    "\n",
    "for each stock symbol separately:\n",
    "* sort by date\n",
    "* take the high value\n",
    "* shift 1 time step in the future\n",
    "* bring into the same multi-index format as `X` above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = df_melted.groupby(\"Symbols\").apply(lambda x: x.set_index(\"date\")[\"high\"].shift(-1)).T.unstack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Quick consistency test:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y[\"AAPL\", pd.to_datetime(\"2015-07-15\")], df.loc[pd.to_datetime(\"2015-07-16\"), \"AAPL\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = y[y.index.isin(X.index)]\n",
    "X = X[X.index.isin(y.index)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The splitting into train and test samples workes in principle the same as with a single identifier, but this time we have a multi-index symbol-date, so the `loc` call looks a bit more complicated:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = X.loc[(slice(None), slice(None, \"2018\")), :]\n",
    "X_test = X.loc[(slice(None), slice(\"2019\", \"2020\")), :]\n",
    "\n",
    "y_train = y.loc[(slice(None), slice(None, \"2018\"))]\n",
    "y_test = y.loc[(slice(None), slice(\"2019\", \"2020\"))]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_selected = select_features(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are training a regressor for each of the stocks separately"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adas = {stock: AdaBoostRegressor() for stock in [\"AAPL\", \"F\", \"GOOGL\"]}\n",
    "\n",
    "for stock, ada in adas.items():\n",
    "    ada.fit(X_train_selected.loc[stock], y_train.loc[stock])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets check again how good our prediction is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test_selected = X_test[X_train_selected.columns]\n",
    "\n",
    "y_pred = pd.concat({\n",
    "    stock: pd.Series(adas[stock].predict(X_test_selected.loc[stock]), index=X_test_selected.loc[stock].index)\n",
    "    for stock in adas.keys()\n",
    "})\n",
    "y_pred.index.names = [\"Symbols\", \"last_date\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(15, 6))\n",
    "\n",
    "y.unstack(\"Symbols\").plot(ax=plt.gca())\n",
    "y_pred.unstack(\"Symbols\").shift(-1).plot(ax=plt.gca(), legend=None, marker=\".\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
